Creation of a PG critic script. #1254

pstaabp · 2025-06-20T19:20:45Z

The pg-critic script analyzes a PG problem (or a list of them) for both good and bad features. Currently these include:

Positive features:

Uses PGML
Provides a solution
Provides a hint
Uses Scaffolds
Uses a custom checker
Uses a multianswer
Uses answer hints
Uses nicetables

Old and deprecated features:

Uses deprecated macros
Uses BEGIN_TEXT/END_TEXT
Includes TEXT(beginproblem)
Includes old tables (for example from unionTables.pl)
Uses num_cmp, str_cmp, and fun_cmp in lieu of MathObjects
Includes Context()->TeXStrings
Calls loadMacros more than once
Includes the line $showPartialCorrectAnswers = 1, which is the default behavior and thus unnecessary
Uses methods from PGchoicemacros.pl
Includes code or other text below the ENDDOCUMENT(); line that marks the end of the problem

Currently the script can also score a problem using a rubric that is built into the script. This can be made more flexible.

Also, there are plans to include more features on both the positive and negative side.

drgrice1

Please set the executable bit on the pg-critic.pl script.

pstaabp · 2025-06-27T21:04:32Z

This improves the pg-critic script. It now also scans for additional positive features, including the use of modern contexts, parsers, and macros.

This also updates the script's help page and adds a JSON output format that writes to a file, which might be helpful for scanning large numbers of problems.

duffee · 2025-07-06T09:27:22Z

Nice work!

A suggestion for an optional critic policy is the amount of randomness available in the question. An anti-pattern I've seen by authors in a hurry (Contrib/BrockPhysics/College_Physics_Urone) is

$A1 = 63.5;
ANS(num_cmp("$A1"));

usually the result of values hardcoded in an image or just copied straight from an Answer Key. Other problems aside, this pgproblem has no randomness at all. It would be nice to know:

whether there is randomness in the answers
an estimate of how many permutations are available
the range of answers (because I can call random, but not use it in a calculation)

It's more a pedagogic issue than authoring style, but the first two should be a Simple Matter of Programming ;)

pstaabp · 2025-07-07T14:51:11Z

Since this script just analyzes the code and doesn't run it, I think it will be difficult to test randomness of a problem by evaluating the problem for multiple seeds. We've talked about doing this at some point.

We could add a check for calling random or related methods within a problem. However, the following

$a = random(1,10);
$ans = Real(5);
BEGIN_PGML
[_]{$ans}
END_PGML

would be detected as using random, but the problem won't actually be random.

Alex-Jordan · 2025-07-07T15:51:02Z

A related note: randomness in *answers* is just part of the story. Exercise statements can have lots of randomness even when all seeds lead to the same answer. For example, give three random vectors, designed so that they have rank 2, and ask for the dimension of the space they span. The answer would always be 2, but it's still a good randomizable exercise.

dlglin · 2025-07-07T16:37:44Z

A simple first run would be to check for the word random. If it's not present then display a message like "it looks like this problem might not be randomized". This will obviously miss a bunch of problems that are not truly randomized, and might have some false positives as well, but it's easy to implement.

pstaabp · 2025-07-08T18:13:16Z

Just added a check for randomness by detecting functions of the form random(, list_random( or random_ which covers things like random_subset and random_coprime.

drgrice1

I think that at this point, this and openwebwork/webwork2#2748 will need to be deferred until the next release. There are a lot of problems with the current code. The webwork2 pull request has many issues including things like invalid HTML. Both pull requests need code clean up.

The most serious issue though is with this pull request. We need to make sure that issues (both positive and negative) are correctly detected. It would be really bad if a positive aspect were detected as a negative aspect or vice versa, and that is happening at least to some extent.

Note that your code skips full-line comments, but does not exclude comments at the end of a line. That means these end-of-line comments could produce incorrect results.

Any problem that uses macros like parserPopUp.pl, parserRadioButtons.pl, or parserCheckboxList.pl with randomization of the choices, but does not have any other random call in the problem, is currently flagged as not having "randomness". That will definitely need to be fixed before this can be rolled out, since there are a lot of problems that use those macros. I think in general the randomness check will need to be done quite differently in order to work. I think that looking for the strings "random(" or "random_...(" or "list_random(" is not going to work. Those strings could be contained in an end-of-line comment and the current check will count it. Even if the end-of-line comment issue is fixed, these words could appear in an actual string in the code and the check will count that also.
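To make these failure modes concrete, here is a minimal sketch of a text-based check. The `looks_random` function is a hypothetical stand-in written for illustration, not the code in this pull request, and the `RadioButtons` line assumes the nested-bracket syntax for randomized choices. It uses only core Perl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a text-based check: skip full-line comments,
# then look for random(, list_random(, or random_*( anywhere on the line.
sub looks_random {
    my ($source) = @_;
    for my $line (split /\n/, $source) {
        next if $line =~ /^\s*#/;    # only full-line comments are skipped
        return 1 if $line =~ /\b(?:list_random|random(?:_\w+)?)\s*\(/;
    }
    return 0;
}

# Each sample is PG source held as plain text; none of it is executed.
my %samples = (
    trailing_comment => '$a = 5;  # was random(1, 10) before',
    string_literal   => '$note = "do not call random(1, 10) here";',
    radio_only       => '$rb = RadioButtons([ [ "A", "B", "C" ], "None" ], "A");',
    unused_random    => "\$a = random(1, 10);\n\$ans = Real(5);",
);

for my $name (sort keys %samples) {
    printf "%-17s %s\n", $name, looks_random($samples{$name}) ? 'random' : 'not random';
}
```

With this sketch, the trailing comment, the string literal, and the unused `random` call are all counted as randomness, while the radio button sample with shuffled choices is not.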

Also, something is now wrong with the bin/pg-critic.pl script. All problems are returning a score of 0. This is because you must have changed from using "good" and "bad" in the PGProblemCritic.pm file to using "positive" and "negative", but didn't change the pg-critic.pl script accordingly. Even with that fixed there are issues because of other changes in the module that were not made in the script (particularly the addition of "randomness"). In general that brings into question the techniques used. The general data structure is not good.
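A small sketch of how a key rename like this can silently zero out every score. The result shape and the one-point-per-feature rubric here are assumptions made for illustration; the actual structure in PGProblemCritic.pm may differ.

```perl
use strict;
use warnings;

# Hypothetical result shape after the rename to "positive"/"negative".
my $result = {
    positive => [ 'PGML', 'solution' ],
    negative => ['BEGIN_TEXT'],
};

# Hypothetical rubric: one point per positive feature, minus one per negative.
# A missing key is treated as an empty list, so a wrong key name is not an error.
sub score {
    my ($features, $plus_key, $minus_key) = @_;
    my $plus  = @{ $features->{$plus_key}  // [] };
    my $minus = @{ $features->{$minus_key} // [] };
    return $plus - $minus;
}

print 'old keys: ', score($result, 'good',     'bad'),      "\n";
print 'new keys: ', score($result, 'positive', 'negative'), "\n";
```

Because the lookup with the old key names finds nothing and raises no warning, the mismatch only shows up as a score of 0.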

In general, I don't think that this approach of parsing the code as text is going to work. A more versatile and reliable approach is needed.

pstaabp · 2025-07-14T20:16:28Z

Switched this to develop. Will continue to work on it.

This uses `Perl::Critic` and custom PG policies for `Perl::Critic` to analyze the code. The custom PG policies must be under the `Perl::Critic::Policy` namespace to be loaded by `Perl::Critic` (it gives no alternative for that). That means they are in the `lib/Perl/Critic/Policy` directory. Policies corresponding to everything that was attempted to be detected in openwebwork#1254 have been implemented except for `randomness`. The `randomness` of a problem is far more complicated than just checking if `random`, `list_random`, etc. are called.

Basically, the code of a problem is first translated (via the `default_preprocess_code` method of the `WeBWorK::PG::Translator` package), then converted to a `PPI::Document` (PPI is the underlying library that `Perl::Critic` uses), and that is passed to `Perl::Critic`.

There are some utility methods provided in the `WeBWorK::PG::Critic::Utils` package that can be used by the PG policies. At this point those are `getDeprecatedMacros`, `parsePGMLBlock`, and `parseTextBlock`. The `getDeprecatedMacros` method just lists the macros in the `macros/deprecated` directory. The `parsePGMLBlock` method parses PGML contents, actually using `PGML::Parse` for the parsing, and returns `PPI::Document` representations of the content. At this point only command blocks are returned (the Perl content of `[@ ... @]` blocks), but more can be added as needed by the policies that are created. The `parseTextBlock` method is similar but parses `BEGIN_TEXT`/`END_TEXT` blocks (and the ilk) using a simplified `ev_substring` approach. At this point only the contents of `\{ ... \}` blocks are returned, and other elements can be added later if needed. Unfortunately, the `parsePGMLBlock` and `parseTextBlock` methods do not give proper positioning within the code, so the line and column numbers of the things in the return value will not be reliable. The only policy that uses these at this point is the `Perl::Critic::Policy::PG::RequireImageAltAttribute` policy, and that just reports the violations as being inside the PGML or text block the violations are found in.

Also, the original untranslated code is passed to the policies and can be used if needed. The `Perl::Critic::Policy::PG::ProhibitEnddocumentMatter` policy is the only one that uses this at this point.

Note that since this is just `Perl::Critic`, this also reports violations of the core `Perl::Critic` policies (at severity level 4). However, there are policies that clearly don't apply to PG problem code, and so those are disabled. For instance, obviously `use strict` and `use warnings` can't be called in a problem, so the `Perl::Critic::Policy::TestingAndDebugging::RequireUseStrict` and `Perl::Critic::Policy::TestingAndDebugging::RequireUseWarnings` policies are disabled. The disabled policies start at line 57 of the `WeBWorK::PG::Critic` package. This may need tweaking as there may be other policies that need to be disabled as well, but those are the common violations that I have seen over the years while using this on problems and that should not apply to them (I have used a form of this PG critic without the custom PG policies for some time now -- see https://github.com/drgrice1/pg-language-server).

Also note that since this is just `Perl::Critic`, you can also use `## no critic` annotations in the code to disable policy violations for a specific line, the entire file, a specific policy on a specific line, etc. See https://metacpan.org/pod/Perl::Critic#BENDING-THE-RULES. For example, if you have a problem that is in the works and are not ready to add metadata, then add `## no critic (PG::RequireMetadata)` to the beginning of the file, and you won't see the violations for having missing metadata. Note that the `bin/pg-critic.pl` script has a `-s` or `--strict` option that ignores all `## no critic` annotations, and forces all policies to be enforced.
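As a rough illustration of why working on a `PPI::Document` avoids the comment and string problems of text matching, here is a minimal sketch that looks for calls to `random`-style functions by inspecting tokens. It is not one of the policies in this pull request. It assumes the PPI module is installed, skips the `default_preprocess_code` translation step, and uses a made-up code sample. Finding a call this way still says nothing about whether the result is actually used.

```perl
use strict;
use warnings;
use PPI;

# Made-up sample: only the third line really calls a random function.
my $code = <<'END_CODE';
$a = 5;    # was random(1, 10) before
$note = "do not call random(1, 10) here";
$b = list_random(2, 3, 5);
END_CODE

my $doc = PPI::Document->new(\$code);

# Comments and quoted strings are separate token types, so only barewords
# that are directly followed by an argument list are considered.
my $calls = $doc->find(sub {
    my (undef, $element) = @_;
    return 0 unless $element->isa('PPI::Token::Word');
    return 0 unless $element->content =~ /^(?:list_random|random(?:_\w+)?)$/;
    my $next = $element->snext_sibling;
    return $next && $next->isa('PPI::Structure::List') ? 1 : 0;
});

# find returns an empty string when nothing matches, hence the fallback.
for my $call (@{ $calls || [] }) {
    print $call->content, ' at line ', $call->line_number, "\n";
}
```

Run on this sample, the sketch should report only the `list_random` call and ignore the mentions of `random(` in the comment and in the string.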
The result is a reliable, versatile, and extendable approach for critiquing problem code.

Since there was a desire to have a "problem score" and to reward good behavior, that has been implemented. That means that not all "violations" are bad. Some of them are good. The score is implemented by setting the "explanation" of each violation as a hash which will have the keys `score` and `explanation`. The score will be positive if the "violation" is good, and negative otherwise. The `explanation` is of course a string that would be the usual explanation. This is a bit of a hack since `Perl::Critic` expects the explanation to be either a string or a reference to an array of numbers (page numbers in the PBP book), but the `explanation` method of the `Perl::Critic::Violation` object returns the hash as is, so this works to get the score from the policy.

Although, I am wondering if this "problem score" is really a good idea. If we do start using this and make these scores public, will a low score on a problem deter usage of the problem? It seems like this might happen, and there are basic but quite good problems that are going to get low scores simply because they don't need complicated macros and code for their implementation. Will a high score really mean that a problem is good anyway? What do we really want these scores for? Some sort of validation when our problems get high scores because they utilize the things that happen to be encouraged at the time?

I am thinking that this "problem score" idea really was NOT a good idea, and should be removed. If the score is removed, then there is also no point in the "positive violations". Those simply become a "pat on the back" for doing something right, which is really not needed (in fact that is all they really are even with the score, in my opinion). So my proposal is to actually make this a proper critic that just shows the things in a problem that need improvement, and remove the score and the "positive violations". That is in my opinion what is really important here.
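A minimal sketch of how such explanation hashes could be combined, assuming each one has the `score` and `explanation` keys described above. The plain hashes stand in for the return values of the `explanation` method, and the numbers and messages are invented for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical stand-ins for what each violation's explanation method returns.
my @explanations = (
    { score => 5,   explanation => 'PGML is used for the problem text.' },
    { score => -10, explanation => 'BEGIN_TEXT/END_TEXT is used instead of PGML.' },
    { score => -5,  explanation => 'loadMacros is called more than once.' },
);

# The "problem score" is the sum over positive and negative violations.
my $problem_score = sum0(map { $_->{score} } @explanations);

# Without the score, only the items that need improvement remain of interest.
my @needs_work = grep { $_->{score} < 0 } @explanations;

print "Problem score: $problem_score\n";
print "  $_->{explanation}\n" for @needs_work;
```

The last two statements mirror the two options under discussion: a single aggregate number, or a plain list of the things that need improvement.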

pstaabp mentioned this pull request Jun 20, 2025

Add the pg-critic problem analyzer to the PGEditor openwebwork/webwork2#2748

Open

pstaabp force-pushed the pg-analysis branch 2 times, most recently from 4ed2649 to cb5e280 Compare June 20, 2025 19:33

drgrice1 reviewed Jun 20, 2025

pstaabp force-pushed the pg-analysis branch 2 times, most recently from 8e69136 to 3756e61 Compare June 21, 2025 02:35

pstaabp force-pushed the pg-analysis branch from 2b38775 to 2b634a8 Compare June 27, 2025 21:01

pstaabp force-pushed the pg-analysis branch from 2b634a8 to e24b1d6 Compare July 8, 2025 18:10

drgrice1 requested changes Jul 9, 2025

pstaabp changed the base branch from PG-2.20 to develop July 14, 2025 19:59

pstaabp force-pushed the pg-analysis branch from e24b1d6 to f3a3fcf Compare July 14, 2025 20:13

pstaabp added 5 commits July 14, 2025 16:15

Initial checkin of PG critic script.

80b3625

Update the executable bit of the file.

3ef45d5

Handle loadMacros arguments in the form q{ } and related.

7a221d9

Improvements on the PGcritic

f586d45

Added extra features to the critic.

Add randomness to the list of criteria to be checked.

4702f4c

Also updated the POD

pstaabp force-pushed the pg-analysis branch from f3a3fcf to 4702f4c Compare July 14, 2025 20:15

drgrice1 mentioned this pull request Jul 16, 2025

Add a PG Critic for checking that PG code conforms to best-practices in problem authoring. #1278

Open
